The CASH algorithm: cost-sensitive attribute selection using histograms

Authors

  • Yael Weiss
  • Yuval Elovici
  • Lior Rokach
Abstract

Feature selection is an essential process for machine learning tasks since it improves generalization capabilities and reduces both run-time and a model's complexity. In many applications, the cost of collecting the features must be taken into account. To cope with the cost problem, we developed a new cost-sensitive fitness function based on histogram comparison. This function is integrated with a genetic search method to form a new feature selection algorithm termed CASH (cost-sensitive attribute selection algorithm using histograms). The CASH algorithm takes into account feature collection costs as well as feature grouping and misclassification costs. Our experiments in various domains demonstrated the superiority of CASH over several other cost-sensitive genetic algorithms. © 2011 Elsevier Inc. All rights reserved.
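The abstract does not give the fitness function itself, so the following is only a toy sketch of the general idea it describes: a histogram-based score traded off against feature and group costs, searched with a simple genetic algorithm. Everything here is an assumption made for illustration and not the authors' method: the separation measure (one minus histogram intersection), the linear cost penalty, the `cost_weight` parameter, the shared-group cost model, and all function names and data. Misclassification costs, which CASH also considers, are not modeled.

```python
import random

def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with a fixed bin count."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant columns
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]

def separation(column, labels, bins=5):
    """Assumed score: 1 - histogram intersection of the two class histograms."""
    lo, hi = min(column), max(column)
    h0 = histogram([v for v, y in zip(column, labels) if y == 0], bins, lo, hi)
    h1 = histogram([v for v, y in zip(column, labels) if y == 1], bins, lo, hi)
    return 1.0 - sum(min(a, b) for a, b in zip(h0, h1))

def total_cost(mask, feature_costs, groups, group_costs):
    """Per-feature costs plus a one-time cost for each group that is used."""
    cost = sum(c for bit, c in zip(mask, feature_costs) if bit)
    used = {g for bit, g in zip(mask, groups) if bit and g is not None}
    return cost + sum(group_costs[g] for g in used)

def fitness(mask, columns, labels, feature_costs, groups, group_costs,
            cost_weight=0.1):
    """Hypothetical fitness: summed separation minus a weighted cost penalty."""
    benefit = sum(separation(col, labels)
                  for bit, col in zip(mask, columns) if bit)
    return benefit - cost_weight * total_cost(mask, feature_costs,
                                              groups, group_costs)

def genetic_search(fitness_fn, n_features, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Minimal GA over bit masks: tournament selection, one-point crossover,
    bit-flip mutation, and elitism. Requires n_features >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness_fn)
    for _ in range(generations):
        new_pop = [best[:]]  # carry the best mask over unchanged
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness_fn)
            p2 = max(rng.sample(pop, 2), key=fitness_fn)
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness_fn)
    return best

# Made-up toy data: feature 0 is informative and cheap, feature 1 is
# informative but expensive, feature 2 is uninformative.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.15, 0.3, 0.8, 0.9, 0.85, 0.7],
    [0.1, 0.2, 0.15, 0.3, 0.8, 0.9, 0.85, 0.7],
    [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
]
feature_costs = [1.0, 10.0, 1.0]
groups = [None, "lab", "lab"]   # features 1 and 2 share a group cost
group_costs = {"lab": 5.0}

def toy_fitness(mask):
    return fitness(mask, columns, labels, feature_costs, groups, group_costs)

best = genetic_search(toy_fitness, n_features=len(columns))
print(best, toy_fitness(best))
```

In this sketch the group cost is charged once no matter how many features of the group are selected, which is one simple way to read "feature grouping"; the paper may define it differently.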

Similar resources

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advances in technology and business, fraud detection has become a critical component of financial transactions. Given the vast amounts of data in large datasets, it becomes more difficult to detect fraudulent transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

A Multi-Mode Resource-Constrained Optimization of Time-Cost Trade-off Problems in Project Scheduling Using a Genetic Algorithm

In this paper, we present a genetic algorithm (GA) for optimization of a multi-mode resource-constrained time-cost trade-off (MRCTCT) problem. In the proposed GA, each activity has several operational modes and each mode identifies a possible execution time and cost of the activity. Beyond earlier studies on the time-cost trade-off problem, in the MRCTCT problem, resource requirements of each execution mo...

Efficient Selection of Design Parameters in Multi-Objective Economic-Statistical Model of Attribute C Control Chart

The C control chart is the best-known chart for monitoring the number of nonconformities per inspection unit, where each sample has a constant size. Generally, the design of a control chart requires determination of the sample size, sampling interval, and control limit width. Optimally selecting these parameters depends on several process parameters, which have been considered from statistical and...

Cost-sensitive Naïve Bayes Classification of Uncertain Data

Data uncertainty is widespread in real-world applications. It has attracted a lot of attention, but little work has been devoted to research on cost-sensitive algorithms for uncertain data. The paper proposes a novel cost-sensitive Naïve Bayes algorithm, CS-DTU, for classifying and predicting uncertain datasets. In the paper, we apply probability and statistics theory to the uncertain data model, define ...

Feature selection using genetic algorithm for classification of schizophrenia using fMRI data

In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...

Journal:
  • Inf. Sci.

Volume 222

Publication date: 2013